First Steps towards Multi-Engine Machine Translation

Author

Andreas Eisele

Abstract

We motivate our contribution to the shared MT task as a first step towards an integrated architecture that combines advantages of statistical and knowledge-based approaches. Translations were generated using the Pharaoh decoder with tables derived from the provided alignments for all four languages, and for three of them using web-based and locally installed commercial systems. We then applied statistical and heuristic algorithms to select the most promising translation out of each set of candidates obtained from a source sentence. Results and possible refinements are discussed.

1 Motivation and Long-term Perspective

“The problem of robust, efficient and reliable speech-to-speech translation can only be cracked by the combined muscle of deep and shallow processing approaches.” (Wahlster, 2001)

Although this statement was coined in the context of VerbMobil, which aimed at translation for direct communication, it also appears realistic for many other translation scenarios, where demands on robustness, coverage, or adaptability on the input side and quality on the output side go beyond today’s technological possibilities.

The increasing availability of MT engines and the need for better quality have motivated considerable efforts to combine multiple engines into one “super-engine” that is hopefully better than any of its ingredients, an idea pioneered in (Frederking and Nirenburg, 1994). So far, the larger group of related publications has focused on the task of selecting, from a set of translation candidates obtained from different engines, the one translation that looks most promising (Tidhar and Küssner, 2000; Akiba et al., 2001; Callison-Burch and Flournoy, 2001; Akiba et al., 2002; Nomoto, 2004). The more challenging problem of decomposing the candidates and re-assembling from the pieces a new sentence, hopefully better than any of the given inputs, has also recently gained considerable attention (Rayner and Carter, 1997; Hogan and Frederking, 1998; Bangalore et al., 2001; Jayaraman and Lavie, 2005).

Although statistical MT approaches currently come out as winners in most comparative evaluations, the achievable quality of methods relying purely on lookup of fixed phrases will be limited by a simple fact: for any given combination of topic, application scenario, language pair, and text style, there will never be sufficient amounts of pre-existing translations to satisfy the needs of purely data-driven approaches. Rule-based approaches can exploit the effort that goes into single entries in their knowledge repositories in a broader way, as these entries can be unfolded, via rule applications, into large numbers of possible usages. However, this increased generality comes at significant cost for the acquisition of the required knowledge, which needs to be encoded by specialists in formalisms that require extensive training to use.

In order to push the limits of today’s MT technology, integrative approaches will have to be developed that combine the relative advantages of

Similar resources

Multi-Engine Machine Translation with an Open-Source Decoder for Statistical Machine Translation

We describe an architecture that allows combining statistical machine translation (SMT) with rule-based machine translation (RBMT) in a multi-engine setup. We use a variant of standard SMT technology to align translations from one or more RBMT systems with the source text. We incorporate phrases extracted from these alignments into the phrase table of the SMT system and use the open-source dec...

Multi-Engine Machine Translation with an Open-Source SMT Decoder

Are We There Yet?

Statistical approaches to Artificial Intelligence are behind most success stories of the field in the past decade. The idea of generating non-trivial behaviour by analysing vast amounts of data has enabled recommendation systems, search engines, spam filters, optical character recognition, machine translation and speech recognition, among other things. As we celebrate the spectacular achievemen...

Pipelined Multi-Engine Machine Translation: Accomplishment of MATES/CK System

In this paper, we propose a new pipelined multi-engine approach to machine translation, which can take advantage of the previously proposed methods, such as rule-based, example-based, pattern-based and statistics-based methods, and eliminate their disadvantages. Some key new techniques in the multi-engine approach, including attribute knowledge classifications, statistical decision-making, patt...

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. Under Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space character leads to some s...

Predictive Models of Performance in Multi-Engine Machine Translation

The paper describes a novel approach to Multi-Engine Machine Translation. We build statistical models of translation performance and use them to guide us in combining and selecting from the outputs of multiple MT engines. We empirically demonstrate that the MEMT system based on these models outperforms any of its component engines.

Publication date: 2005
